Aminet 1

home *** CD-ROM | disk | FTP | other *** search

/ Aminet 1 / Aminet - June 1993 [Walnut Creek].iso / usenet / sources / volume90 / unix / flex_2_3 / part04 < prev next >

Wrap

Internet Message Format | 1990-08-19 | 49.0 KB

Path: abcfd20.larc.nasa.gov!amiga-request From: amiga-request@abcfd20.larc.nasa.gov (Amiga Sources/Binaries Moderator) Subject: v90i231: flex 2.3 - fast lexical analyzer generator, Part04/13 Reply-To: loftus@wpllabs.uucp (William P Loftus) Newsgroups: comp.sources.amiga Message-ID: <comp.sources.amiga:v90i231@abcfd20.larc.nasa.gov> Date: 19 Aug 90 22:42:25 GMT Approved: tadguy@uunet.UU.NET (Tad Guy) X-Mail-Submissions-To: amiga@uunet.uu.net X-Post-Discussions-To: comp.sys.amiga Submitted-by: loftus@wpllabs.uucp (William P Loftus) Posting-number: Volume 90, Issue 231 Archive-name: unix/flex-2.3/part04 #!/bin/sh # This is a shell archive. Remove anything before this line, then unpack # it by saving it into a file and typing "sh file". To overwrite existing # files, type "sh file -c". You can also feed this as standard input via # unshar, or by typing "sh <file", e.g.. If this archive is complete, you # will see the following message at the end: # "End of archive 4 (of 13)." # Contents: flex.1 tblcmp.c # Wrapped by tadguy@abcfd20 on Sun Aug 19 18:41:43 1990 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f 'flex.1' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'flex.1'\" else echo shar: Extracting \"'flex.1'\" $20799 characters$ sed "s/^X//" >'flex.1' <<'END_OF_FILE' X.TH FLEX 1 "26 May 1990" "Version 2.3" X.SH NAME Xflex - fast lexical analyzer generator X.SH SYNOPSIS X.B flex X.B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton] X.I [filename ...] X.SH DESCRIPTION X.I flex Xis a tool for generating X.I scanners: Xprograms which recognized lexical patterns in text. X.I flex Xreads Xthe given input files, or its standard input if no file names are given, Xfor a description of a scanner to generate. The description is in Xthe form of pairs Xof regular expressions and C code, called X.I rules. flex Xgenerates as output a C source file, X.B lex.yy.c, Xwhich defines a routine X.B yylex(). XThis file is compiled and linked with the X.B -lfl Xlibrary to produce an executable. When the executable is run, Xit analyzes its input for occurrences Xof the regular expressions. Whenever it finds one, it executes Xthe corresponding C code. X.LP XFor full documentation, see X.B flexdoc(1). XThis manual entry is intended for use as a quick reference. X.SH OPTIONS X.I flex Xhas the following options: X.TP X.B -b XGenerate backtracking information to X.I lex.backtrack. XThis is a list of scanner states which require backtracking Xand the input characters on which they do so. By adding rules one Xcan remove backtracking states. If all backtracking states Xare eliminated and X.B -f Xor X.B -F Xis used, the generated scanner will run faster. X.TP X.B -c Xis a do-nothing, deprecated option included for POSIX compliance. X.IP X.B NOTE: Xin previous releases of X.I flex X.B -c Xspecified table-compression options. This functionality is Xnow given by the X.B -C Xflag. To ease the the impact of this change, when X.I flex Xencounters X.B -c, Xit currently issues a warning message and assumes that X.B -C Xwas desired instead. In the future this "promotion" of X.B -c Xto X.B -C Xwill go away in the name of full POSIX compliance (unless Xthe POSIX meaning is removed first). X.TP X.B -d Xmakes the generated scanner run in X.I debug Xmode. Whenever a pattern is recognized and the global X.B yy_flex_debug Xis non-zero (which is the default), the scanner will Xwrite to X.I stderr Xa line of the form: X.nf X X --accepting rule at line 53 ("the matched text") X X.fi XThe line number refers to the location of the rule in the file Xdefining the scanner (i.e., the file that was fed to flex). Messages Xare also generated when the scanner backtracks, accepts the Xdefault rule, reaches the end of its input buffer (or encounters Xa NUL; the two look the same as far as the scanner's concerned), Xor reaches an end-of-file. X.TP X.B -f Xspecifies (take your pick) X.I full table Xor X.I fast scanner. XNo table compression is done. The result is large but fast. XThis option is equivalent to X.B -Cf X(see below). X.TP X.B -i Xinstructs X.I flex Xto generate a X.I case-insensitive Xscanner. The case of letters given in the X.I flex Xinput patterns will Xbe ignored, and tokens in the input will be matched regardless of case. The Xmatched text given in X.I yytext Xwill have the preserved case (i.e., it will not be folded). X.TP X.B -n Xis another do-nothing, deprecated option included only for XPOSIX compliance. X.TP X.B -p Xgenerates a performance report to stderr. The report Xconsists of comments regarding features of the X.I flex Xinput file which will cause a loss of performance in the resulting scanner. X.TP X.B -s Xcauses the X.I default rule X(that unmatched scanner input is echoed to X.I stdout) Xto be suppressed. If the scanner encounters input that does not Xmatch any of its rules, it aborts with an error. X.TP X.B -t Xinstructs X.I flex Xto write the scanner it generates to standard output instead Xof X.B lex.yy.c. X.TP X.B -v Xspecifies that X.I flex Xshould write to X.I stderr Xa summary of statistics regarding the scanner it generates. X.TP X.B -F Xspecifies that the X.ul Xfast Xscanner table representation should be used. This representation is Xabout as fast as the full table representation X.ul X(-f), Xand for some sets of patterns will be considerably smaller (and for Xothers, larger). See X.B flexdoc(1) Xfor details. X.IP XThis option is equivalent to X.B -CF X(see below). X.TP X.B -I Xinstructs X.I flex Xto generate an X.I interactive Xscanner, that is, a scanner which stops immediately rather than Xlooking ahead if it knows Xthat the currently scanned text cannot be part of a longer rule's match. XAgain, see X.B flexdoc(1) Xfor details. X.IP XNote, X.B -I Xcannot be used in conjunction with X.I full Xor X.I fast tables, Xi.e., the X.B -f, -F, -Cf, Xor X.B -CF Xflags. X.TP X.B -L Xinstructs X.I flex Xnot to generate X.B #line Xdirectives in X.B lex.yy.c. XThe default is to generate such directives so error Xmessages in the actions will be correctly Xlocated with respect to the original X.I flex Xinput file, and not to Xthe fairly meaningless line numbers of X.B lex.yy.c. X.TP X.B -T Xmakes X.I flex Xrun in X.I trace Xmode. It will generate a lot of messages to X.I stdout Xconcerning Xthe form of the input and the resultant non-deterministic and deterministic Xfinite automata. This option is mostly for use in maintaining X.I flex. X.TP X.B -8 Xinstructs X.I flex Xto generate an 8-bit scanner. XOn some sites, this is the default. On others, the default Xis 7-bit characters. To see which is the case, check the verbose X.B (-v) Xoutput for "equivalence classes created". If the denominator of Xthe number shown is 128, then by default X.I flex Xis generating 7-bit characters. If it is 256, then the default is X8-bit characters. X.TP X.B -C[efmF] Xcontrols the degree of table compression. X.IP X.B -Ce Xdirects X.I flex Xto construct X.I equivalence classes, Xi.e., sets of characters Xwhich have identical lexical properties. XEquivalence classes usually give Xdramatic reductions in the final table/object file sizes (typically Xa factor of 2-5) and are pretty cheap performance-wise (one array Xlook-up per character scanned). X.IP X.B -Cf Xspecifies that the X.I full Xscanner tables should be generated - X.I flex Xshould not compress the Xtables by taking advantages of similar transition functions for Xdifferent states. X.IP X.B -CF Xspecifies that the alternate fast scanner representation (described in X.B flexdoc(1)) Xshould be used. X.IP X.B -Cm Xdirects X.I flex Xto construct X.I meta-equivalence classes, Xwhich are sets of equivalence classes (or characters, if equivalence Xclasses are not being used) that are commonly used together. Meta-equivalence Xclasses are often a big win when using compressed tables, but they Xhave a moderate performance impact (one or two "if" tests and one Xarray look-up per character scanned). X.IP XA lone X.B -C Xspecifies that the scanner tables should be compressed but neither Xequivalence classes nor meta-equivalence classes should be used. X.IP XThe options X.B -Cf Xor X.B -CF Xand X.B -Cm Xdo not make sense together - there is no opportunity for meta-equivalence Xclasses if the table is not being compressed. Otherwise the options Xmay be freely mixed. X.IP XThe default setting is X.B -Cem, Xwhich specifies that X.I flex Xshould generate equivalence classes Xand meta-equivalence classes. This setting provides the highest Xdegree of table compression. You can trade off Xfaster-executing scanners at the cost of larger tables with Xthe following generally being true: X.nf X X slowest & smallest X -Cem X -Cm X -Ce X -C X -C{f,F}e X -C{f,F} X fastest & largest X X.fi X.IP X.B -C Xoptions are not cumulative; whenever the flag is encountered, the Xprevious -C settings are forgotten. X.TP X.B -Sskeleton_file Xoverrides the default skeleton file from which X.I flex Xconstructs its scanners. You'll never need this option unless you are doing X.I flex Xmaintenance or development. X.SH SUMMARY OF FLEX REGULAR EXPRESSIONS XThe patterns in the input are written using an extended set of regular Xexpressions. These are: X.nf X X x match the character 'x' X . any character except newline X [xyz] a "character class"; in this case, the pattern X matches either an 'x', a 'y', or a 'z' X [abj-oZ] a "character class" with a range in it; matches X an 'a', a 'b', any letter from 'j' through 'o', X or a 'Z' X [^A-Z] a "negated character class", i.e., any character X but those in the class. In this case, any X character EXCEPT an uppercase letter. X [^A-Z\\n] any character EXCEPT an uppercase letter or X a newline X r* zero or more r's, where r is any regular expression X r+ one or more r's X r? zero or one r's (that is, "an optional r") X r{2,5} anywhere from two to five r's X r{2,} two or more r's X r{4} exactly 4 r's X {name} the expansion of the "name" definition X (see above) X "[xyz]\\"foo" X the literal string: [xyz]"foo X \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', X then the ANSI-C interpretation of \\x. X Otherwise, a literal 'X' (used to escape X operators such as '*') X \\123 the character with octal value 123 X \\x2a the character with hexadecimal value 2a X (r) match an r; parentheses are used to override X precedence (see below) X X X rs the regular expression r followed by the X regular expression s; called "concatenation" X X X r|s either an r or an s X X X r/s an r but only if it is followed by an s. The X s is not part of the matched text. This type X of pattern is called as "trailing context". X ^r an r, but only at the beginning of a line X r$ an r, but only at the end of a line. Equivalent X to "r/\\n". X X X <s>r an r, but only in start condition s (see X below for discussion of start conditions) X <s1,s2,s3>r X same, but in any of start conditions s1, X s2, or s3 X X X <<EOF>> an end-of-file X <s1,s2><<EOF>> X an end-of-file when in start condition s1 or s2 X X.fi XThe regular expressions listed above are grouped according to Xprecedence, from highest precedence at the top to lowest at the bottom. XThose grouped together have equal precedence. X.LP XSome notes on patterns: X.IP - XNegated character classes X.I match newlines Xunless "\\n" (or an equivalent escape sequence) is one of the Xcharacters explicitly present in the negated character class X(e.g., "[^A-Z\\n]"). X.IP - XA rule can have at most one instance of trailing context (the '/' operator Xor the '$' operator). The start condition, '^', and "<<EOF>>" patterns Xcan only occur at the beginning of a pattern, and, as well as with '/' and '$', Xcannot be grouped inside parentheses. The following are all illegal: X.nf X X foo/bar$ X foo|(bar$) X foo|^bar X <sc1>foo<sc2>bar X X.fi X.SH SUMMARY OF SPECIAL ACTIONS XIn addition to arbitrary C code, the following can appear in actions: X.IP - X.B ECHO Xcopies yytext to the scanner's output. X.IP - X.B BEGIN Xfollowed by the name of a start condition places the scanner in the Xcorresponding start condition. X.IP - X.B REJECT Xdirects the scanner to proceed on to the "second best" rule which matched the Xinput (or a prefix of the input). X.B yytext Xand X.B yyleng Xare set up appropriately. Note that X.B REJECT Xis a particularly expensive feature in terms scanner performance; Xif it is used in X.I any Xof the scanner's actions it will slow down X.I all Xof the scanner's matching. Furthermore, X.B REJECT Xcannot be used with the X.I -f Xor X.I -F Xoptions. X.IP XNote also that unlike the other special actions, X.B REJECT Xis a X.I branch; Xcode immediately following it in the action will X.I not Xbe executed. X.IP - X.B yymore() Xtells the scanner that the next time it matches a rule, the corresponding Xtoken should be X.I appended Xonto the current value of X.B yytext Xrather than replacing it. X.IP - X.B yyless(n) Xreturns all but the first X.I n Xcharacters of the current token back to the input stream, where they Xwill be rescanned when the scanner looks for the next match. X.B yytext Xand X.B yyleng Xare adjusted appropriately (e.g., X.B yyleng Xwill now be equal to X.I n X). X.IP - X.B unput(c) Xputs the character X.I c Xback onto the input stream. It will be the next character scanned. X.IP - X.B input() Xreads the next character from the input stream (this routine is called X.B yyinput() Xif the scanner is compiled using X.B C++). X.IP - X.B yyterminate() Xcan be used in lieu of a return statement in an action. It terminates Xthe scanner and returns a 0 to the scanner's caller, indicating "all done". X.IP XBy default, X.B yyterminate() Xis also called when an end-of-file is encountered. It is a macro and Xmay be redefined. X.IP - X.B YY_NEW_FILE Xis an action available only in <<EOF>> rules. It means "Okay, I've Xset up a new input file, continue scanning". X.IP - X.B yy_create_buffer( file, size ) Xtakes a X.I FILE Xpointer and an integer X.I size. XIt returns a YY_BUFFER_STATE Xhandle to a new input buffer large enough to accomodate X.I size Xcharacters and associated with the given file. When in doubt, use X.B YY_BUF_SIZE Xfor the size. X.IP - X.B yy_switch_to_buffer( new_buffer ) Xswitches the scanner's processing to scan for tokens from Xthe given buffer, which must be a YY_BUFFER_STATE. X.IP - X.B yy_delete_buffer( buffer ) Xdeletes the given buffer. X.SH VALUES AVAILABLE TO THE USER X.IP - X.B char *yytext Xholds the text of the current token. It may not be modified. X.IP - X.B int yyleng Xholds the length of the current token. It may not be modified. X.IP - X.B FILE *yyin Xis the file which by default X.I flex Xreads from. It may be redefined but doing so only makes sense before Xscanning begins. Changing it in the middle of scanning will have Xunexpected results since X.I flex Xbuffers its input. Once scanning terminates because an end-of-file Xhas been seen, X.B Xvoid yyrestart( FILE *new_file ) Xmay be called to point X.I yyin Xat the new input file. X.IP - X.B FILE *yyout Xis the file to which X.B ECHO Xactions are done. It can be reassigned by the user. X.IP - X.B YY_CURRENT_BUFFER Xreturns a X.B YY_BUFFER_STATE Xhandle to the current buffer. X.SH MACROS THE USER CAN REDEFINE X.IP - X.B YY_DECL Xcontrols how the scanning routine is declared. XBy default, it is "int yylex()", or, if prototypes are being Xused, "int yylex(void)". This definition may be changed by redefining Xthe "YY_DECL" macro. Note that Xif you give arguments to the scanning routine using a XK&R-style/non-prototyped function declaration, you must terminate Xthe definition with a semi-colon (;). X.IP - XThe nature of how the scanner Xgets its input can be controlled by redefining the X.B YY_INPUT Xmacro. XYY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its Xaction is to place up to X.I max_size Xcharacters in the character array X.I buf Xand return in the integer variable X.I result Xeither the Xnumber of characters read or the constant YY_NULL (0 on Unix systems) Xto indicate EOF. The default YY_INPUT reads from the Xglobal file-pointer "yyin". XA sample redefinition of YY_INPUT (in the definitions Xsection of the input file): X.nf X X %{ X #undef YY_INPUT X #define YY_INPUT(buf,result,max_size) \\ X { \\ X int c = getchar(); \\ X result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\ X } X %} X X.fi X.IP - XWhen the scanner receives an end-of-file indication from YY_INPUT, Xit then checks the X.B yywrap() Xfunction. If X.B yywrap() Xreturns false (zero), then it is assumed that the Xfunction has gone ahead and set up X.I yyin Xto point to another input file, and scanning continues. If it returns Xtrue (non-zero), then the scanner terminates, returning 0 to its Xcaller. X.IP XThe default X.B yywrap() Xalways returns 1. Presently, to redefine it you must first X"#undef yywrap", as it is currently implemented as a macro. It is Xlikely that X.B yywrap() Xwill soon be defined to be a function rather than a macro. X.IP - XYY_USER_ACTION Xcan be redefined to provide an action Xwhich is always executed prior to the matched rule's action. X.IP - XThe macro X.B YY_USER_INIT Xmay be redefined to provide an action which is always executed before Xthe first scan. X.IP - XIn the generated scanner, the actions are all gathered in one large Xswitch statement and separated using X.B YY_BREAK, Xwhich may be redefined. By default, it is simply a "break", to separate Xeach rule's action from the following rule's. X.SH FILES X.TP X.I flex.skel Xskeleton scanner. X.TP X.I lex.yy.c Xgenerated scanner (called X.I lexyy.c Xon some systems). X.TP X.I lex.backtrack Xbacktracking information for X.B -b Xflag (called X.I lex.bck Xon some systems). X.TP X.B -lfl Xlibrary with which to link the scanners. X.SH "SEE ALSO" X.LP Xflexdoc(1), lex(1), yacc(1), sed(1), awk(1). X.LP XM. E. Lesk and E. Schmidt, X.I LEX - Lexical Analyzer Generator X.SH DIAGNOSTICS X.I reject_used_but_not_detected undefined Xor X.LP X.I yymore_used_but_not_detected undefined - XThese errors can occur at compile time. They indicate that the Xscanner uses X.B REJECT Xor X.B yymore() Xbut that X.I flex Xfailed to notice the fact, meaning that X.I flex Xscanned the first two sections looking for occurrences of these actions Xand failed to find any, but somehow you snuck some in (via a #include Xfile, for example). Make an explicit reference to the action in your X.I flex Xinput file. (Note that previously X.I flex Xsupported a X.B %used/%unused Xmechanism for dealing with this problem; this feature is still supported Xbut now deprecated, and will go away soon unless the author hears from Xpeople who can argue compellingly that they need it.) X.LP X.I flex scanner jammed - Xa scanner compiled with X.B -s Xhas encountered an input string which wasn't matched by Xany of its rules. X.LP X.I flex input buffer overflowed - Xa scanner rule matched a string long enough to overflow the Xscanner's internal input buffer (16K bytes - controlled by X.B YY_BUF_MAX Xin "flex.skel"). X.LP X.I scanner requires -8 flag - XYour scanner specification includes recognizing 8-bit characters and Xyou did not specify the -8 flag (and your site has not installed flex Xwith -8 as the default). X.LP X.I Xfatal flex scanner internal error--end of buffer missed - XThis can occur in an scanner which is reentered after a long-jump Xhas jumped out (or over) the scanner's activation frame. Before Xreentering the scanner, use: X.nf X X yyrestart( yyin ); X X.fi X.LP X.I too many %t classes! - XYou managed to put every single character into its own %t class. X.I flex Xrequires that at least one of the classes share characters. X.SH AUTHOR XVern Paxson, with the help of many ideas and much inspiration from XVan Jacobson. Original version by Jef Poskanzer. X.LP XSee flexdoc(1) for additional credits and the address to send comments to. X.SH DEFICIENCIES / BUGS X.LP XSome trailing context Xpatterns cannot be properly matched and generate Xwarning messages ("Dangerous trailing context"). These are Xpatterns where the ending of the Xfirst part of the rule matches the beginning of the second Xpart, such as "zx*/xy*", where the 'x*' matches the 'x' at Xthe beginning of the trailing context. (Note that the POSIX draft Xstates that the text matched by such patterns is undefined.) X.LP XFor some trailing context rules, parts which are actually fixed-length are Xnot recognized as such, leading to the abovementioned performance loss. XIn particular, parts using '|' or {n} (such as "foo{3}") are always Xconsidered variable-length. X.LP XCombining trailing context with the special '|' action can result in X.I fixed Xtrailing context being turned into the more expensive X.I variable Xtrailing context. For example, this happens in the following example: X.nf X X %% X abc | X xyz/def X X.fi X.LP XUse of unput() invalidates yytext and yyleng. X.LP XUse of unput() to push back more text than was matched can Xresult in the pushed-back text matching a beginning-of-line ('^') Xrule even though it didn't come at the beginning of the line X(though this is rare!). X.LP XPattern-matching of NUL's is substantially slower than matching other Xcharacters. X.LP X.I flex Xdoes not generate correct #line directives for code internal Xto the scanner; thus, bugs in X.I flex.skel Xyield bogus line numbers. X.LP XDue to both buffering of input and read-ahead, you cannot intermix Xcalls to <stdio.h> routines, such as, for example, X.B getchar(), Xwith X.I flex Xrules and expect it to work. Call X.B input() Xinstead. X.LP XThe total table entries listed by the X.B -v Xflag excludes the number of table entries needed to determine Xwhat rule has been matched. The number of entries is equal Xto the number of DFA states if the scanner does not use X.B REJECT, Xand somewhat greater than the number of states if it does. X.LP X.B REJECT Xcannot be used with the X.I -f Xor X.I -F Xoptions. X.LP XSome of the macros, such as X.B yywrap(), Xmay in the future become functions which live in the X.B -lfl Xlibrary. This will doubtless break a lot of code, but may be Xrequired for POSIX-compliance. X.LP XThe X.I flex Xinternal algorithms need documentation. END_OF_FILE if test 20799 -ne `wc -c <'flex.1'`; then echo shar: \"'flex.1'\" unpacked with wrong size! fi # end of 'flex.1' fi if test -f 'tblcmp.c' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'tblcmp.c'\" else echo shar: Extracting \"'tblcmp.c'\" $25169 characters$ sed "s/^X//" >'tblcmp.c' <<'END_OF_FILE' X/* tblcmp - table compression routines */ X X/*- X * Copyright (c) 1990 The Regents of the University of California. X * All rights reserved. X * X * This code is derived from software contributed to Berkeley by X * Vern Paxson. X * X * The United States Government has rights in this work pursuant X * to contract no. DE-AC03-76SF00098 between the United States X * Department of Energy and the University of California. X * X * Redistribution and use in source and binary forms are permitted provided X * that: (1) source distributions retain this entire copyright notice and X * comment, and (2) distributions including binaries display the following X * acknowledgement: ``This product includes software developed by the X * University of California, Berkeley and its contributors'' in the X * documentation or other materials provided with the distribution and in X * all advertising materials mentioning features or use of this software. X * Neither the name of the University nor the names of its contributors may X * be used to endorse or promote products derived from this software without X * specific prior written permission. X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED X * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF X * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. X */ X X#ifndef lint Xstatic char rcsid[] = X "@(#) $Header: /usr/fsys/odin/a/vern/flex/RCS/tblcmp.c,v 2.5 90/06/27 23:48:38 vern Exp $ (LBL)"; X#endif X X#include "flexdef.h" X X X/* declarations for functions that have forward references */ X Xvoid mkentry PROTO((register int*, int, int, int, int)); Xvoid mkprot PROTO((int[], int, int)); Xvoid mktemplate PROTO((int[], int, int)); Xvoid mv2front PROTO((int)); Xint tbldiff PROTO((int[], int, int[])); X X X/* bldtbl - build table entries for dfa state X * X * synopsis X * int state[numecs], statenum, totaltrans, comstate, comfreq; X * bldtbl( state, statenum, totaltrans, comstate, comfreq ); X * X * State is the statenum'th dfa state. It is indexed by equivalence class and X * gives the number of the state to enter for a given equivalence class. X * totaltrans is the total number of transitions out of the state. Comstate X * is that state which is the destination of the most transitions out of State. X * Comfreq is how many transitions there are out of State to Comstate. X * X * A note on terminology: X * "protos" are transition tables which have a high probability of X * either being redundant (a state processed later will have an identical X * transition table) or nearly redundant (a state processed later will have X * many of the same out-transitions). A "most recently used" queue of X * protos is kept around with the hope that most states will find a proto X * which is similar enough to be usable, and therefore compacting the X * output tables. X * "templates" are a special type of proto. If a transition table is X * homogeneous or nearly homogeneous (all transitions go to the same X * destination) then the odds are good that future states will also go X * to the same destination state on basically the same character set. X * These homogeneous states are so common when dealing with large rule X * sets that they merit special attention. If the transition table were X * simply made into a proto, then (typically) each subsequent, similar X * state will differ from the proto for two out-transitions. One of these X * out-transitions will be that character on which the proto does not go X * to the common destination, and one will be that character on which the X * state does not go to the common destination. Templates, on the other X * hand, go to the common state on EVERY transition character, and therefore X * cost only one difference. X */ X Xvoid bldtbl( state, statenum, totaltrans, comstate, comfreq ) Xint state[], statenum, totaltrans, comstate, comfreq; X X { X int extptr, extrct[2][CSIZE + 1]; X int mindiff, minprot, i, d; X int checkcom; X X /* If extptr is 0 then the first array of extrct holds the result of the X * "best difference" to date, which is those transitions which occur in X * "state" but not in the proto which, to date, has the fewest differences X * between itself and "state". If extptr is 1 then the second array of X * extrct hold the best difference. The two arrays are toggled X * between so that the best difference to date can be kept around and X * also a difference just created by checking against a candidate "best" X * proto. X */ X X extptr = 0; X X /* if the state has too few out-transitions, don't bother trying to X * compact its tables X */ X X if ( (totaltrans * 100) < (numecs * PROTO_SIZE_PERCENTAGE) ) X mkentry( state, numecs, statenum, JAMSTATE, totaltrans ); X X else X { X /* checkcom is true if we should only check "state" against X * protos which have the same "comstate" value X */ X X checkcom = comfreq * 100 > totaltrans * CHECK_COM_PERCENTAGE; X X minprot = firstprot; X mindiff = totaltrans; X X if ( checkcom ) X { X /* find first proto which has the same "comstate" */ X for ( i = firstprot; i != NIL; i = protnext[i] ) X if ( protcomst[i] == comstate ) X { X minprot = i; X mindiff = tbldiff( state, minprot, extrct[extptr] ); X break; X } X } X X else X { X /* since we've decided that the most common destination out X * of "state" does not occur with a high enough frequency, X * we set the "comstate" to zero, assuring that if this state X * is entered into the proto list, it will not be considered X * a template. X */ X comstate = 0; X X if ( firstprot != NIL ) X { X minprot = firstprot; X mindiff = tbldiff( state, minprot, extrct[extptr] ); X } X } X X /* we now have the first interesting proto in "minprot". If X * it matches within the tolerances set for the first proto, X * we don't want to bother scanning the rest of the proto list X * to see if we have any other reasonable matches. X */ X X if ( mindiff * 100 > totaltrans * FIRST_MATCH_DIFF_PERCENTAGE ) X { /* not a good enough match. Scan the rest of the protos */ X for ( i = minprot; i != NIL; i = protnext[i] ) X { X d = tbldiff( state, i, extrct[1 - extptr] ); X if ( d < mindiff ) X { X extptr = 1 - extptr; X mindiff = d; X minprot = i; X } X } X } X X /* check if the proto we've decided on as our best bet is close X * enough to the state we want to match to be usable X */ X X if ( mindiff * 100 > totaltrans * ACCEPTABLE_DIFF_PERCENTAGE ) X { X /* no good. If the state is homogeneous enough, we make a X * template out of it. Otherwise, we make a proto. X */ X X if ( comfreq * 100 >= totaltrans * TEMPLATE_SAME_PERCENTAGE ) X mktemplate( state, statenum, comstate ); X X else X { X mkprot( state, statenum, comstate ); X mkentry( state, numecs, statenum, JAMSTATE, totaltrans ); X } X } X X else X { /* use the proto */ X mkentry( extrct[extptr], numecs, statenum, X prottbl[minprot], mindiff ); X X /* if this state was sufficiently different from the proto X * we built it from, make it, too, a proto X */ X X if ( mindiff * 100 >= totaltrans * NEW_PROTO_DIFF_PERCENTAGE ) X mkprot( state, statenum, comstate ); X X /* since mkprot added a new proto to the proto queue, it's possible X * that "minprot" is no longer on the proto queue (if it happened X * to have been the last entry, it would have been bumped off). X * If it's not there, then the new proto took its physical place X * (though logically the new proto is at the beginning of the X * queue), so in that case the following call will do nothing. X */ X X mv2front( minprot ); X } X } X } X X X/* cmptmps - compress template table entries X * X * synopsis X * cmptmps(); X * X * template tables are compressed by using the 'template equivalence X * classes', which are collections of transition character equivalence X * classes which always appear together in templates - really meta-equivalence X * classes. until this point, the tables for templates have been stored X * up at the top end of the nxt array; they will now be compressed and have X * table entries made for them. X */ X Xvoid cmptmps() X X { X int tmpstorage[CSIZE + 1]; X register int *tmp = tmpstorage, i, j; X int totaltrans, trans; X X peakpairs = numtemps * numecs + tblend; X X if ( usemecs ) X { X /* create equivalence classes base on data gathered on template X * transitions X */ X X nummecs = cre8ecs( tecfwd, tecbck, numecs ); X } X X else X nummecs = numecs; X X if ( lastdfa + numtemps + 1 >= current_max_dfas ) X increase_max_dfas(); X X /* loop through each template */ X X for ( i = 1; i <= numtemps; ++i ) X { X totaltrans = 0; /* number of non-jam transitions out of this template */ X X for ( j = 1; j <= numecs; ++j ) X { X trans = tnxt[numecs * i + j]; X X if ( usemecs ) X { X /* the absolute value of tecbck is the meta-equivalence class X * of a given equivalence class, as set up by cre8ecs X */ X if ( tecbck[j] > 0 ) X { X tmp[tecbck[j]] = trans; X X if ( trans > 0 ) X ++totaltrans; X } X } X X else X { X tmp[j] = trans; X X if ( trans > 0 ) X ++totaltrans; X } X } X X /* it is assumed (in a rather subtle way) in the skeleton that X * if we're using meta-equivalence classes, the def[] entry for X * all templates is the jam template, i.e., templates never default X * to other non-jam table entries (e.g., another template) X */ X X /* leave room for the jam-state after the last real state */ X mkentry( tmp, nummecs, lastdfa + i + 1, JAMSTATE, totaltrans ); X } X } X X X X/* expand_nxt_chk - expand the next check arrays */ X Xvoid expand_nxt_chk() X X { X register int old_max = current_max_xpairs; X X current_max_xpairs += MAX_XPAIRS_INCREMENT; X X ++num_reallocs; X X nxt = reallocate_integer_array( nxt, current_max_xpairs ); X chk = reallocate_integer_array( chk, current_max_xpairs ); X X bzero( (char *) (chk + old_max), X MAX_XPAIRS_INCREMENT * sizeof( int ) / sizeof( char ) ); X } X X X/* find_table_space - finds a space in the table for a state to be placed X * X * synopsis X * int *state, numtrans, block_start; X * int find_table_space(); X * X * block_start = find_table_space( state, numtrans ); X * X * State is the state to be added to the full speed transition table. X * Numtrans is the number of out-transitions for the state. X * X * find_table_space() returns the position of the start of the first block (in X * chk) able to accommodate the state X * X * In determining if a state will or will not fit, find_table_space() must take X * into account the fact that an end-of-buffer state will be added at [0], X * and an action number will be added in [-1]. X */ X Xint find_table_space( state, numtrans ) Xint *state, numtrans; X X { X /* firstfree is the position of the first possible occurrence of two X * consecutive unused records in the chk and nxt arrays X */ X register int i; X register int *state_ptr, *chk_ptr; X register int *ptr_to_last_entry_in_state; X X /* if there are too many out-transitions, put the state at the end of X * nxt and chk X */ X if ( numtrans > MAX_XTIONS_FULL_INTERIOR_FIT ) X { X /* if table is empty, return the first available spot in chk/nxt, X * which should be 1 X */ X if ( tblend < 2 ) X return ( 1 ); X X i = tblend - numecs; /* start searching for table space near the X * end of chk/nxt arrays X */ X } X X else X i = firstfree; /* start searching for table space from the X * beginning (skipping only the elements X * which will definitely not hold the new X * state) X */ X X while ( 1 ) /* loops until a space is found */ X { X if ( i + numecs > current_max_xpairs ) X expand_nxt_chk(); X X /* loops until space for end-of-buffer and action number are found */ X while ( 1 ) X { X if ( chk[i - 1] == 0 ) /* check for action number space */ X { X if ( chk[i] == 0 ) /* check for end-of-buffer space */ X break; X X else X i += 2; /* since i != 0, there is no use checking to X * see if (++i) - 1 == 0, because that's the X * same as i == 0, so we skip a space X */ X } X X else X ++i; X X if ( i + numecs > current_max_xpairs ) X expand_nxt_chk(); X } X X /* if we started search from the beginning, store the new firstfree for X * the next call of find_table_space() X */ X if ( numtrans <= MAX_XTIONS_FULL_INTERIOR_FIT ) X firstfree = i + 1; X X /* check to see if all elements in chk (and therefore nxt) that are X * needed for the new state have not yet been taken X */ X X state_ptr = &state[1]; X ptr_to_last_entry_in_state = &chk[i + numecs + 1]; X X for ( chk_ptr = &chk[i + 1]; chk_ptr != ptr_to_last_entry_in_state; X ++chk_ptr ) X if ( *(state_ptr++) != 0 && *chk_ptr != 0 ) X break; X X if ( chk_ptr == ptr_to_last_entry_in_state ) X return ( i ); X X else X ++i; X } X } X X X/* inittbl - initialize transition tables X * X * synopsis X * inittbl(); X * X * Initializes "firstfree" to be one beyond the end of the table. Initializes X * all "chk" entries to be zero. Note that templates are built in their X * own tbase/tdef tables. They are shifted down to be contiguous X * with the non-template entries during table generation. X */ Xvoid inittbl() X X { X register int i; X X bzero( (char *) chk, current_max_xpairs * sizeof( int ) / sizeof( char ) ); X X tblend = 0; X firstfree = tblend + 1; X numtemps = 0; X X if ( usemecs ) X { X /* set up doubly-linked meta-equivalence classes X * these are sets of equivalence classes which all have identical X * transitions out of TEMPLATES X */ X X tecbck[1] = NIL; X X for ( i = 2; i <= numecs; ++i ) X { X tecbck[i] = i - 1; X tecfwd[i - 1] = i; X } X X tecfwd[numecs] = NIL; X } X } X X X/* mkdeftbl - make the default, "jam" table entries X * X * synopsis X * mkdeftbl(); X */ X Xvoid mkdeftbl() X X { X int i; X X jamstate = lastdfa + 1; X X ++tblend; /* room for transition on end-of-buffer character */ X X if ( tblend + numecs > current_max_xpairs ) X expand_nxt_chk(); X X /* add in default end-of-buffer transition */ X nxt[tblend] = end_of_buffer_state; X chk[tblend] = jamstate; X X for ( i = 1; i <= numecs; ++i ) X { X nxt[tblend + i] = 0; X chk[tblend + i] = jamstate; X } X X jambase = tblend; X X base[jamstate] = jambase; X def[jamstate] = 0; X X tblend += numecs; X ++numtemps; X } X X X/* mkentry - create base/def and nxt/chk entries for transition array X * X * synopsis X * int state[numchars + 1], numchars, statenum, deflink, totaltrans; X * mkentry( state, numchars, statenum, deflink, totaltrans ); X * X * "state" is a transition array "numchars" characters in size, "statenum" X * is the offset to be used into the base/def tables, and "deflink" is the X * entry to put in the "def" table entry. If "deflink" is equal to X * "JAMSTATE", then no attempt will be made to fit zero entries of "state" X * (i.e., jam entries) into the table. It is assumed that by linking to X * "JAMSTATE" they will be taken care of. In any case, entries in "state" X * marking transitions to "SAME_TRANS" are treated as though they will be X * taken care of by whereever "deflink" points. "totaltrans" is the total X * number of transitions out of the state. If it is below a certain threshold, X * the tables are searched for an interior spot that will accommodate the X * state array. X */ X Xvoid mkentry( state, numchars, statenum, deflink, totaltrans ) Xregister int *state; Xint numchars, statenum, deflink, totaltrans; X X { X register int minec, maxec, i, baseaddr; X int tblbase, tbllast; X X if ( totaltrans == 0 ) X { /* there are no out-transitions */ X if ( deflink == JAMSTATE ) X base[statenum] = JAMSTATE; X else X base[statenum] = 0; X X def[statenum] = deflink; X return; X } X X for ( minec = 1; minec <= numchars; ++minec ) X { X if ( state[minec] != SAME_TRANS ) X if ( state[minec] != 0 || deflink != JAMSTATE ) X break; X } X X if ( totaltrans == 1 ) X { X /* there's only one out-transition. Save it for later to fill X * in holes in the tables. X */ X stack1( statenum, minec, state[minec], deflink ); X return; X } X X for ( maxec = numchars; maxec > 0; --maxec ) X { X if ( state[maxec] != SAME_TRANS ) X if ( state[maxec] != 0 || deflink != JAMSTATE ) X break; X } X X /* Whether we try to fit the state table in the middle of the table X * entries we have already generated, or if we just take the state X * table at the end of the nxt/chk tables, we must make sure that we X * have a valid base address (i.e., non-negative). Note that not only are X * negative base addresses dangerous at run-time (because indexing the X * next array with one and a low-valued character might generate an X * array-out-of-bounds error message), but at compile-time negative X * base addresses denote TEMPLATES. X */ X X /* find the first transition of state that we need to worry about. */ X if ( totaltrans * 100 <= numchars * INTERIOR_FIT_PERCENTAGE ) X { /* attempt to squeeze it into the middle of the tabls */ X baseaddr = firstfree; X X while ( baseaddr < minec ) X { X /* using baseaddr would result in a negative base address below X * find the next free slot X */ X for ( ++baseaddr; chk[baseaddr] != 0; ++baseaddr ) X ; X } X X if ( baseaddr + maxec - minec >= current_max_xpairs ) X expand_nxt_chk(); X X for ( i = minec; i <= maxec; ++i ) X if ( state[i] != SAME_TRANS ) X if ( state[i] != 0 || deflink != JAMSTATE ) X if ( chk[baseaddr + i - minec] != 0 ) X { /* baseaddr unsuitable - find another */ X for ( ++baseaddr; X baseaddr < current_max_xpairs && X chk[baseaddr] != 0; X ++baseaddr ) X ; X X if ( baseaddr + maxec - minec >= current_max_xpairs ) X expand_nxt_chk(); X X /* reset the loop counter so we'll start all X * over again next time it's incremented X */ X X i = minec - 1; X } X } X X else X { X /* ensure that the base address we eventually generate is X * non-negative X */ X baseaddr = max( tblend + 1, minec ); X } X X tblbase = baseaddr - minec; X tbllast = tblbase + maxec; X X if ( tbllast >= current_max_xpairs ) X expand_nxt_chk(); X X base[statenum] = tblbase; X def[statenum] = deflink; X X for ( i = minec; i <= maxec; ++i ) X if ( state[i] != SAME_TRANS ) X if ( state[i] != 0 || deflink != JAMSTATE ) X { X nxt[tblbase + i] = state[i]; X chk[tblbase + i] = statenum; X } X X if ( baseaddr == firstfree ) X /* find next free slot in tables */ X for ( ++firstfree; chk[firstfree] != 0; ++firstfree ) X ; X X tblend = max( tblend, tbllast ); X } X X X/* mk1tbl - create table entries for a state (or state fragment) which X * has only one out-transition X * X * synopsis X * int state, sym, onenxt, onedef; X * mk1tbl( state, sym, onenxt, onedef ); X */ X Xvoid mk1tbl( state, sym, onenxt, onedef ) Xint state, sym, onenxt, onedef; X X { X if ( firstfree < sym ) X firstfree = sym; X X while ( chk[firstfree] != 0 ) X if ( ++firstfree >= current_max_xpairs ) X expand_nxt_chk(); X X base[state] = firstfree - sym; X def[state] = onedef; X chk[firstfree] = state; X nxt[firstfree] = onenxt; X X if ( firstfree > tblend ) X { X tblend = firstfree++; X X if ( firstfree >= current_max_xpairs ) X expand_nxt_chk(); X } X } X X X/* mkprot - create new proto entry X * X * synopsis X * int state[], statenum, comstate; X * mkprot( state, statenum, comstate ); X */ X Xvoid mkprot( state, statenum, comstate ) Xint state[], statenum, comstate; X X { X int i, slot, tblbase; X X if ( ++numprots >= MSP || numecs * numprots >= PROT_SAVE_SIZE ) X { X /* gotta make room for the new proto by dropping last entry in X * the queue X */ X slot = lastprot; X lastprot = protprev[lastprot]; X protnext[lastprot] = NIL; X } X X else X slot = numprots; X X protnext[slot] = firstprot; X X if ( firstprot != NIL ) X protprev[firstprot] = slot; X X firstprot = slot; X prottbl[slot] = statenum; X protcomst[slot] = comstate; X X /* copy state into save area so it can be compared with rapidly */ X tblbase = numecs * (slot - 1); X X for ( i = 1; i <= numecs; ++i ) X protsave[tblbase + i] = state[i]; X } X X X/* mktemplate - create a template entry based on a state, and connect the state X * to it X * X * synopsis X * int state[], statenum, comstate, totaltrans; X * mktemplate( state, statenum, comstate, totaltrans ); X */ X Xvoid mktemplate( state, statenum, comstate ) Xint state[], statenum, comstate; X X { X int i, numdiff, tmpbase, tmp[CSIZE + 1]; X Char transset[CSIZE + 1]; X int tsptr; X X ++numtemps; X X tsptr = 0; X X /* calculate where we will temporarily store the transition table X * of the template in the tnxt[] array. The final transition table X * gets created by cmptmps() X */ X X tmpbase = numtemps * numecs; X X if ( tmpbase + numecs >= current_max_template_xpairs ) X { X current_max_template_xpairs += MAX_TEMPLATE_XPAIRS_INCREMENT; X X ++num_reallocs; X X tnxt = reallocate_integer_array( tnxt, current_max_template_xpairs ); X } X X for ( i = 1; i <= numecs; ++i ) X if ( state[i] == 0 ) X tnxt[tmpbase + i] = 0; X else X { X transset[tsptr++] = i; X tnxt[tmpbase + i] = comstate; X } X X if ( usemecs ) X mkeccl( transset, tsptr, tecfwd, tecbck, numecs, 0 ); X X mkprot( tnxt + tmpbase, -numtemps, comstate ); X X /* we rely on the fact that mkprot adds things to the beginning X * of the proto queue X */ X X numdiff = tbldiff( state, firstprot, tmp ); X mkentry( tmp, numecs, statenum, -numtemps, numdiff ); X } X X X/* mv2front - move proto queue element to front of queue X * X * synopsis X * int qelm; X * mv2front( qelm ); X */ X Xvoid mv2front( qelm ) Xint qelm; X X { X if ( firstprot != qelm ) X { X if ( qelm == lastprot ) X lastprot = protprev[lastprot]; X X protnext[protprev[qelm]] = protnext[qelm]; X X if ( protnext[qelm] != NIL ) X protprev[protnext[qelm]] = protprev[qelm]; X X protprev[qelm] = NIL; X protnext[qelm] = firstprot; X protprev[firstprot] = qelm; X firstprot = qelm; X } X } X X X/* place_state - place a state into full speed transition table X * X * synopsis X * int *state, statenum, transnum; X * place_state( state, statenum, transnum ); X * X * State is the statenum'th state. It is indexed by equivalence class and X * gives the number of the state to enter for a given equivalence class. X * Transnum is the number of out-transitions for the state. X */ X Xvoid place_state( state, statenum, transnum ) Xint *state, statenum, transnum; X X { X register int i; X register int *state_ptr; X int position = find_table_space( state, transnum ); X X /* base is the table of start positions */ X base[statenum] = position; X X /* put in action number marker; this non-zero number makes sure that X * find_table_space() knows that this position in chk/nxt is taken X * and should not be used for another accepting number in another state X */ X chk[position - 1] = 1; X X /* put in end-of-buffer marker; this is for the same purposes as above */ X chk[position] = 1; X X /* place the state into chk and nxt */ X state_ptr = &state[1]; X X for ( i = 1; i <= numecs; ++i, ++state_ptr ) X if ( *state_ptr != 0 ) X { X chk[position + i] = i; X nxt[position + i] = *state_ptr; X } X X if ( position + numecs > tblend ) X tblend = position + numecs; X } X X X/* stack1 - save states with only one out-transition to be processed later X * X * synopsis X * int statenum, sym, nextstate, deflink; X * stack1( statenum, sym, nextstate, deflink ); X * X * if there's room for another state one the "one-transition" stack, the X * state is pushed onto it, to be processed later by mk1tbl. If there's X * no room, we process the sucker right now. X */ X Xvoid stack1( statenum, sym, nextstate, deflink ) Xint statenum, sym, nextstate, deflink; X X { X if ( onesp >= ONE_STACK_SIZE - 1 ) X mk1tbl( statenum, sym, nextstate, deflink ); X X else X { X ++onesp; X onestate[onesp] = statenum; X onesym[onesp] = sym; X onenext[onesp] = nextstate; X onedef[onesp] = deflink; X } X } X X X/* tbldiff - compute differences between two state tables X * X * synopsis X * int state[], pr, ext[]; X * int tbldiff, numdifferences; X * numdifferences = tbldiff( state, pr, ext ) X * X * "state" is the state array which is to be extracted from the pr'th X * proto. "pr" is both the number of the proto we are extracting from X * and an index into the save area where we can find the proto's complete X * state table. Each entry in "state" which differs from the corresponding X * entry of "pr" will appear in "ext". X * Entries which are the same in both "state" and "pr" will be marked X * as transitions to "SAME_TRANS" in "ext". The total number of differences X * between "state" and "pr" is returned as function value. Note that this X * number is "numecs" minus the number of "SAME_TRANS" entries in "ext". X */ X Xint tbldiff( state, pr, ext ) Xint state[], pr, ext[]; X X { X register int i, *sp = state, *ep = ext, *protp; X register int numdiff = 0; X X protp = &protsave[numecs * (pr - 1)]; X X for ( i = numecs; i > 0; --i ) X { X if ( *++protp == *++sp ) X *++ep = SAME_TRANS; X else X { X *++ep = *sp; X ++numdiff; X } X } X X return ( numdiff ); X } END_OF_FILE if test 25169 -ne `wc -c <'tblcmp.c'`; then echo shar: \"'tblcmp.c'\" unpacked with wrong size! fi # end of 'tblcmp.c' fi echo shar: End of archive 4 $of 13$. cp /dev/null ark4isdone MISSING="" for I in 1 2 3 4 5 6 7 8 9 10 11 12 13 ; do if test ! -f ark${I}isdone ; then MISSING="${MISSING} ${I}" fi done if test "${MISSING}" = "" ; then echo You have unpacked all 13 archives. rm -f ark[1-9]isdone ark[1-9][0-9]isdone else echo You still need to unpack the following archives: echo " " ${MISSING} fi ## End of shell archive. exit 0 -- Mail submissions (sources or binaries) to <amiga@uunet.uu.net>. Mail comments to the moderator at <amiga-request@uunet.uu.net>. Post requests for sources, and general discussion to comp.sys.amiga.